AD-A267  135  1 



TO  APPEAR:  12TH  SYMPOSIUM  ON  RELIABLE  DISTRIBUTED  SYSTEMS, 
OCTOBER  6-8,  1993 


PRINCETON, NJ


Lazy  Checkpoint  Coordination  for  Bounding  Rollback  Propagation 



Yi-Min  Wang  and  W.  Kent  Fuchs 
Coordinated  Science  Laboratory 
University  of  Illinois  at  Urbana-Champaign 


Abstract 

In this paper, we propose the technique of lazy checkpoint
coordination, which preserves process autonomy while employing
communication-induced checkpoint coordination for bounding rollback
propagation. The notion of laziness is introduced to control the
coordination frequency and allow a flexible trade-off between the cost
of checkpoint coordination and the average rollback distance.
Worst-case overhead analysis provides a means for estimating the extra
checkpoint overhead. Communication trace-driven simulation for several
parallel programs is used to evaluate the benefits of the proposed
scheme.

1  Introduction 

Uncoordinated checkpointing [1-3] for parallel and distributed systems
allows maximum process autonomy and independent design of recovery
capability for each process. However, in a general nondeterministic
execution, cascading rollback propagation may result in the domino
effect [4], which can prevent progression of the recovery line. It has
been shown that message reordering [5] and message logging [3] can
effectively reduce rollback propagation. In order to entirely eliminate
the possibility of domino effects, extra checkpoints need to be taken
based on the communication history. Kim et al. [6] and Venkatesh et al.
[7] employ transitive dependency tracking and insert a checkpoint
before processing any message that introduces a new dependency.
Russell [8] proves that, by inserting a checkpoint between every pair
of consecutive send and receive events (in that order), domino-free
recovery is ensured. The log-based approach [9-17] assumes the
piecewise deterministic execution model [12], where a process execution
consists of a number of deterministic state intervals, each started by
a nondeterministic event. It has been shown that logging a
nondeterministic event equivalently places a logical checkpoint [18] at
the end of the ensuing state interval, and these extra logical
checkpoints serve to eliminate the domino effect.

Coordinated checkpointing achieves domino-free recovery by sacrificing
a certain degree of process autonomy and incurring run-time and extra
message overhead. Usually, whenever a checkpoint is initiated by one
process, all the other processes are informed and required to take
appropriate checkpoints in order to guarantee that the resulting set of
checkpoints is consistent [19-24].

We will use the term eager checkpoint coordination for the
coordination action performed when checkpoints are initiated, as
described above. In contrast, processes in a system with lazy
checkpoint coordination only coordinate their corresponding
checkpoints when message communication indicates a violation of
checkpoint consistency. Briatico et al. [25] force the receiver of a
message m to take a checkpoint before processing m if the sender's
checkpoint interval number tagged on m is greater than that of the
receiver. Checkpoints with the same ordinal numbers are therefore
always guaranteed to be consistent. However, the run-time overhead may
be high due to the possibly excessive number of extra induced
checkpoints. In this paper, we generalize the concept of
communication-induced checkpoint coordination by introducing the
notion of laziness $Z$ as a measure of the frequency for performing
coordination. Only corresponding checkpoints with ordinal numbers
$nZ$, where $n$ is an integer, are required to be consistent with each
other for bounding rollback propagation. Overhead analysis shows that
our generalization can significantly reduce the number of extra
checkpoints compared to the previous work [25], which corresponds to
the case of $Z = 1$.

2  Checkpointing  and  Rollback  Recovery 

The system considered in this paper consists of a number of concurrent
processes for which all process communication is through message
passing. Processes are assumed to run on fail-stop processors [26]
and, for the purpose of presentation, each process is considered an
individual recovery unit. In order to allow general nondeterministic
execution, we do not assume a piecewise deterministic model. This
implies that whenever the sender of a message m rolls back and unsends
m, the receiver which has already processed m must also roll back to
undo the effect of m, because the potential nondeterminism preceding
the sending of m may prevent the same message from being resent during
reexecution. Let $c_{i,x}$ denote the $x$th checkpoint ($x \ge 0$) of
process $p_i$ ($0 \le i \le N-1$), where $N$ is the number of processes
in the system. Two checkpoints $c_{i,x}$ and $c_{j,y}$ are then
considered inconsistent if there is any message sent after $c_{j,y}$
and processed before $c_{i,x}$, or vice versa. In contrast, when the
receiver of a message m' rolls back and unreceives m', the sender need
not roll back to unsend m' if m' can be retrieved from a message log
[3, 11, 12, 27] or through a reliable end-to-end transmission protocol
[14, 22].

During normal execution, each process periodically and independently
saves its state as a checkpoint on stable storage. The interval between
$c_{i,x}$ and $c_{i,x+1}$ is called the $x$th checkpoint interval of
$p_i$, written $(i, x)$ below. Each message is tagged with the current
checkpoint interval number and the process number of the sender, and
each receiver $p_i$ performs direct dependency tracking [1, 28] as
follows: if a message sent from $(j, y)$ is processed in $(i, x)$, the
direct dependency of $c_{i,x+1}$ on $c_{j,y}$ is recorded.
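The tagging and recording rule above can be sketched in Python as follows. This is only an illustrative sketch: the class name `DependencyTracker`, its methods, and the choice of a dictionary keyed by checkpoint ordinal are assumptions made here, not part of the protocol description.

```python
from collections import defaultdict


class DependencyTracker:
    """Per-process bookkeeping for direct dependency tracking (hypothetical helper)."""

    def __init__(self, pid):
        self.pid = pid
        self.interval = 0             # current checkpoint interval number x
        self.deps = defaultdict(set)  # maps x + 1 -> {(j, y), ...}

    def take_checkpoint(self):
        # Taking c_{i,x+1} closes interval x and opens interval x + 1.
        self.interval += 1

    def tag(self):
        # Each outgoing message carries (sender process number, sender interval).
        return (self.pid, self.interval)

    def process(self, tag):
        # A message sent from (j, y) and processed in (i, x) makes
        # c_{i,x+1} directly dependent on c_{j,y}.
        self.deps[self.interval + 1].add(tag)
```

For instance, if p1 takes one checkpoint and then sends to p0, which is still in interval 0, p0 records that its next checkpoint (ordinal 1) depends on checkpoint 1 of p1.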

A garbage collection procedure can be periodically invoked by any
process $p_i$. First, $p_i$ collects the direct dependency information
from all the other processes to construct the checkpoint graph [1] as
shown in Fig. 1(b). Then the rollback propagation algorithm (Fig. 2) is
applied to the checkpoint graph to determine the global recovery line²
(black vertices), before which all the checkpoints are obsolete and can
be discarded. Alternatively, an optimal garbage collection algorithm
[29] can be used to minimize the space overhead by discarding all the
garbage checkpoints marked "X" in Fig. 1(b).

When any process initiates a rollback, it starts a similar procedure
for recovery. The current volatile states of the surviving processes
are treated as additional virtual checkpoints [2] for constructing an
extended checkpoint graph, of which the recovery line is called the
local recovery line (shaded vertices) and indicates the consistent
rollback state.
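As an illustration, the rollback propagation algorithm of Fig. 2 can be sketched in Python. The graph encoding is an assumption made for this sketch: a checkpoint is a pair `(i, x)`, and `edges` maps each checkpoint to its successors, including both the same-process edge from $c_{i,x}$ to $c_{i,x+1}$ and one edge from $c_{j,y}$ to $c_{i,x+1}$ for every recorded direct dependency. The function names are hypothetical.

```python
from collections import deque


def strictly_reachable(edges, sources):
    """Return all vertices reachable from any source by a path of length >= 1."""
    seen = set()
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in edges.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def recovery_line(num_checkpoints, edges):
    """num_checkpoints[i] is the number of checkpoints of process i (ordinals 0..n-1).

    Returns a dict mapping each process to the ordinal of its checkpoint
    on the recovery line.
    """
    # Start with the latest checkpoint of each process in the root set.
    root = {i: n - 1 for i, n in enumerate(num_checkpoints)}
    marked = set()
    while True:
        marked |= strictly_reachable(edges, list(root.items()))
        hit = [i for i, x in root.items() if (i, x) in marked]
        if not hit:
            return root
        for i in hit:
            # Fall back to the latest unmarked checkpoint of the same process.
            x = root[i]
            while (i, x) in marked:
                x -= 1
            root[i] = x
```

In a small two-process example where a message sent after $c_{0,2}$ is processed before $c_{1,2}$, and a message sent after $c_{1,1}$ is processed before $c_{0,2}$, the sketch rolls both processes back to their checkpoints with ordinal 1.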


3  Lazy  Checkpoint  Coordination 
3.1  Motivation 

We  will  refer  to  the  checkpoints  initiated  independently 
by  each  process  as  basic  checkpoints  and  those  triggered  by 

2  The  global  recovery  line  is  to  be  used  when  the  entire  system  fails, 
while  a  local  recovery  line  is  computed  when  only  a  subset  of  processes 
becomes  faulty. 
Figure 1: Checkpointing and rollback recovery. (a) Example checkpoint
and communication pattern; (b) checkpoint graph and extended
checkpoint graph when $p_0$ initiates a rollback.


/* CP represents a checkpoint */
/* Initially, all the CPs are unmarked */

Include the latest CP of each process in the root set;
Mark all CPs strictly reachable from any CP in the root set;
While (at least one CP in the root set is marked) {
    Replace each marked CP in the root set by the
        latest unmarked CP of the same process;
    Mark all CPs strictly reachable from any CP
        in the root set;
}
The root set is the recovery line.

Figure 2: The rollback propagation algorithm.
the communication as induced checkpoints. Fig. 3(a) illustrates a
situation where the communication pattern renders most of the basic
checkpoints useless for rollback recovery and the global recovery line
stays at the very beginning of the execution. A straightforward way of
avoiding such possibly unbounded rollback propagation is to perform
eager checkpoint coordination as shown in Fig. 3(b), where $b_{i,x}$
denotes the $x$th basic checkpoint of $p_i$. Whenever a process takes
a basic checkpoint, coordination messages (dotted lines) are broadcast
to request cooperation in making a consistent set of checkpoints [19].
Let $B$ be the total number of basic checkpoints and $I$ be the total
number of induced checkpoints. We define the induction ratio as

$$\mathcal{R} = \frac{I}{B},$$

which is a measure of the overhead for performing
communication-induced checkpoint coordination. Clearly, eager
checkpoint coordination has $\mathcal{R} = N - 1$ and will result in
large run-time overhead when $N$ is large. In addition, the $N - 1$
coordination messages per checkpoint session constitute another
overhead.

The large overhead of eager checkpoint coordination results from its
pessimistic nature. More specifically, when $p_1$ in Fig. 3(b)
initiates its first basic checkpoint $b_{1,1}$, it "pessimistically"
assumes that messages like $m_1$ will exist in the future and cause
$b_{1,1}$ to be inconsistent with its corresponding checkpoint
$b_{0,1}$ on $p_0$. In order to guarantee that $b_{1,1}$ belongs to a
useful recovery line, $p_1$ "eagerly" requests $p_0$'s cooperation at
the time $b_{1,1}$ is initiated. In contrast, lazy checkpoint
coordination adopts an optimistic approach by assuming that $b_{0,1}$
will be consistent with $b_{1,1}$. If the assumption turns out to be
true, no explicit coordination is necessary. An extra checkpoint will
be induced on $p_0$ only when message $m_1$ indicates that the
assumption has failed (Fig. 3(c)). From another point of view, such a
scheme "lazily" delays the broadcast of the coordination messages and
implicitly piggybacks them on future normal messages [21]. Both
checkpoint and message overhead can therefore be reduced.

However, given a basic checkpoint pattern, the number of induced
checkpoints in the above scheme is determined by the communication
pattern and is not otherwise controllable. In the worst case, the
induction ratio $\mathcal{R}$ can still be $N - 1$, as illustrated in
Fig. 3(c). In order to further reduce the overhead, we can perform even
"lazier" coordination by only enforcing the consistency between
checkpoints $c_{0,nZ}$ and $c_{1,nZ}$, where $Z$ is again the laziness
and $n$ is an integer. Fig. 3(d) shows the case of $Z = 2$. No
checkpoint is induced until the message $m_2$ indicates the
inconsistency between $b_{1,2}$ and $b_{0,2}$. The number of induced
checkpoints is then reduced from 8 to 2 at the cost of potentially
larger rollback distance.


Figure 3: Communication-induced checkpoint coordination. (a)
Checkpoint and communication pattern; (b) eager checkpoint
coordination; (c) lazy checkpoint coordination with laziness = 1; (d)
lazy checkpoint coordination with laziness = 2.


3.2  The  Protocol 


Our approach is to incorporate lazy checkpoint coordination into the
uncoordinated checkpointing scheme as a mechanism for bounding
rollback propagation. Therefore, the checkpointing and recovery
protocol can be built on top of the one described in Section 2. The
laziness $Z$ is a predetermined system parameter known to all
processes. During normal execution, each process $p_i$ maintains a
variable $V$ which is initialized to $Z$ and incremented by $Z$ each
time $c_{i,nZ}$ is taken. When $p_i$, at its $x$th checkpoint interval,
is about to process a message m tagged with the sender $p_j$'s
checkpoint interval number $y \ge V$, $p_i$ is forced to take the
checkpoint $c_{i,lZ}$, where $l = \lfloor y/Z \rfloor$. In other words,
if m was sent after $c_{j,lZ}$ had been taken, it must be processed by
$p_i$ after $c_{i,lZ}$ is induced. Notice that all the checkpoints
$c_{i,w}$ with $x < w < lZ$ become dummy checkpoints which overlap with
$c_{i,lZ}$.
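The receiver-side rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class `LazyProcess`, its method names, and the `log` list are hypothetical, and setting $V$ to the next multiple of $Z$ above the current interval is one reading of "incremented by $Z$ each time $c_{i,nZ}$ is taken" (dummy checkpoints with ordinal $nZ$ are counted as taken).

```python
class LazyProcess:
    """Checkpoint bookkeeping for one process under laziness Z (illustrative only)."""

    def __init__(self, pid, laziness):
        self.pid = pid
        self.Z = laziness
        self.interval = 0   # current checkpoint interval number x
        self.V = laziness   # next ordinal nZ that has not been taken yet
        self.log = []       # (kind, ordinal) of every checkpoint taken

    def _advance_v(self):
        # V grows by Z for every checkpoint c_{i,nZ} already taken.
        while self.V <= self.interval:
            self.V += self.Z

    def basic_checkpoint(self):
        self.interval += 1
        self.log.append(("basic", self.interval))
        self._advance_v()

    def tag(self):
        # Tag carried by every outgoing message.
        return (self.pid, self.interval)

    def before_processing(self, tag):
        """Apply the lazy coordination rule; return True if a checkpoint is induced."""
        _, y = tag
        if y < self.V:
            return False
        target = (y // self.Z) * self.Z      # l * Z with l = floor(y / Z)
        for w in range(self.interval + 1, target):
            self.log.append(("dummy", w))    # overlaps with the induced checkpoint
        self.log.append(("induced", target))
        self.interval = target
        self._advance_v()
        return True
```

With `laziness=1` the test `y >= V` reduces to "the sender's interval number is greater than the receiver's", which is the rule attributed to [25] in Section 1.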

In addition to the centralized garbage collection procedure described
in Section 2, a simple distributed algorithm can also be used for
low-cost garbage collection. The basic idea is that if the current
checkpoint interval number of every process has exceeded $nZ$, all the
checkpoints $c_{i,y}$ with $y < nZ$ become obsolete with respect to the
consistent set of checkpoints $\{c_{i,nZ} : 0 \le i \le N-1\}$ and
therefore can be discarded. Each process $p_i$ needs to maintain a
checkpointing progress vector CP-progress[N] which records the highest
checkpoint interval number of every process known to $p_i$, based on
the information included in each message. More efficient garbage
collection can be achieved by periodically piggybacking the
CP-progress[N] vector on normal messages.
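A minimal Python sketch of this local garbage collection test is given below. The function names, the plain-list representation of the progress vector, and the optional `piggybacked` argument are assumptions for illustration; "exceeded" is read strictly here, which is the conservative interpretation.

```python
def merge_progress(cp_progress, sender, sender_interval, piggybacked=None):
    """Update the local progress vector from the tag on an incoming message.

    cp_progress[k] is the highest checkpoint interval number of process k
    known locally; `piggybacked` is an optional full vector from the sender.
    """
    cp_progress[sender] = max(cp_progress[sender], sender_interval)
    if piggybacked is not None:
        for k, value in enumerate(piggybacked):
            cp_progress[k] = max(cp_progress[k], value)


def obsolete_below(cp_progress, Z):
    """Return the largest nZ that every known interval number has exceeded.

    Checkpoints with ordinal y < nZ can then be discarded locally.
    """
    lowest = min(cp_progress)
    if lowest <= 0:
        return 0
    return ((lowest - 1) // Z) * Z
```

For example, with Z = 2 and a known progress vector of [5, 4, 7], every process has exceeded ordinal 2 but not ordinal 4, so only checkpoints with ordinal below 2 are treated as obsolete.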

Although $\{c_{i,nZ} : 0 \le i \le N-1\}$ always forms a consistent set
of checkpoints, the two-phase recovery procedure described in Section 2
should still be used to search for the local recovery line in order to
minimize the number of rolled-back processes and the rollback
distances. One possible optimization is that the dependency
information associated with the garbage checkpoints determined locally
based on CP-progress[N] need not be collected, thus reducing the size
of the checkpoint graph.

4  Overhead  Analysis 

Since the checkpoint overhead of the lazy checkpoint coordination
scheme depends on the run-time dynamic communication pattern, it is
important to analyze and estimate the potential extra overhead
resulting from the induced checkpoints. We will first show that,
without any constraints on the relative checkpointing progress of each
process, the worst-case induction ratio is $(N-1)/Z$. Under certain
conditions which are typically met by real applications, however, the
upper bound on the induction ratio can be shown to be independent of
$N$.

4.1  Worst-Case  Analysis 

Our approach to worst-case analysis consists of two steps. First, given
any basic checkpoint pattern, we construct the worst-case communication
pattern. Secondly, given any system with $N$ processes and laziness
$Z$, we derive the worst-case induction ratio as a function of $N$ and
$Z$ by considering these worst-case communication patterns.

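For the purpose of presentation, we assume every checkpoint
$c^{\mathcal{P}}_{i,x}$ in a checkpoint and communication pattern
$\mathcal{P}$ is associated with a global time stamp
$t(c^{\mathcal{P}}_{i,x})$. For any $n$, define
$c^{\mathcal{P}}_{*,nZ} = c^{\mathcal{P}}_{i,nZ}$ if
$t(c^{\mathcal{P}}_{i,nZ}) \le t(c^{\mathcal{P}}_{j,nZ})$ for all
$0 \le j \le N-1$, i.e., $c^{\mathcal{P}}_{*,nZ}$ denotes the earliest
checkpoint #nZ among all processes. Given any basic checkpoint pattern
and laziness $Z$, we construct the communication pattern
$\mathcal{P}_0$ as follows.³ If
$c^{\mathcal{P}_0}_{i,nZ} = c^{\mathcal{P}_0}_{*,nZ}$, then $p_i$ sends
a message to every other process $p_j$ and induces
$c^{\mathcal{P}_0}_{j,nZ}$ with
$t(c^{\mathcal{P}_0}_{j,nZ}) \approx t(c^{\mathcal{P}_0}_{*,nZ})$.
Fig. 4(a) shows an example of $\mathcal{P}_0$ with $Z = 2$. We will
call the interval between $t(c^{\mathcal{P}_0}_{*,(n-1)Z})$ and
$t(c^{\mathcal{P}_0}_{*,nZ})$ the induction session #n, which includes
all the induced checkpoints $c^{\mathcal{P}_0}_{j,nZ}$.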

Since the induction of any checkpoint $c^{\mathcal{P}}_{j,nZ}$ (and
hence any possible dummy checkpoints $c^{\mathcal{P}}_{j,y}$,
$(n-1)Z < y < nZ$) cannot happen until the first checkpoint #nZ, say
$c^{\mathcal{P}}_{i,nZ}$, is taken, $p_i$ needs to take $Z$ consecutive
basic checkpoints by itself in order to reach $c^{\mathcal{P}}_{i,nZ}$,
as stated in Property 1.

PROPERTY 1 If $c^{\mathcal{P}}_{i,nZ} = c^{\mathcal{P}}_{*,nZ}$, then
the $Z$ checkpoints $c^{\mathcal{P}}_{i,x}$, $(n-1)Z < x \le nZ$, must
be basic checkpoints.

By the construction of $\mathcal{P}_0$, it is not hard to see that, for
any $n$, $\mathcal{P}_0$ always has the earliest
$c^{\mathcal{P}}_{*,nZ}$ among all communication patterns, given the
basic checkpoint pattern. (Formal proofs can be found in the complete
technical report [30].) Hence, $\mathcal{P}_0$ must possess the largest
number of $c^{\mathcal{P}}_{*,nZ}$'s. Since each
$c^{\mathcal{P}_0}_{*,nZ}$ in $\mathcal{P}_0$ also induces the largest
possible number ($N - 1$) of induced checkpoints, the total number of
induced checkpoints in $\mathcal{P}_0$ must be the largest, and so we
have the following property.

PROPERTY 2 Given a basic checkpoint pattern, $\mathcal{P}_0$ is the
worst-case communication pattern resulting in the largest induction
ratio.

Property 2 states that, for the analysis of worst-case induction ratio,
we only need to consider the communication

3  When  it  is  clear  from  the  context  that  the  basic  checkpoint  pattern  is 
fixed,  the  same  notation  for  the  checkpoint  and  communication  pattern 
will  also  be  used  to  refer  to  the  communication  pattern. 


Figure 4: (a) Worst-case communication pattern; (b) worst-case
checkpoint and communication pattern.

pattern $\mathcal{P}_0$ for each basic checkpoint pattern. Since every
$\mathcal{P}_0$ has well-defined induction sessions as shown in Fig. 4,
the derivations can be greatly simplified.

From Property 1, at least $Z$ basic checkpoints are needed to induce at
most $N - 1$ checkpoints, and so we have an upper bound on the
induction ratio

$$\mathcal{R} \le \frac{N-1}{Z}. \tag{1}$$

It is also the worst-case induction ratio achievable by some
$\mathcal{P}_0$, for which an example with $Z = 2$ and $N = 3$ is shown
in Fig. 4(b). (The stacked checkpoints indicate that each dummy
checkpoint $c^{\mathcal{P}_0}_{j,2n-1}$ overlaps with the induced
checkpoint $c^{\mathcal{P}_0}_{j,2n}$.)

4.2  The  Upper  Bound  under  Constraints 


The upper bound in Eq. (1) was derived under no constraints on the
checkpoint and communication pattern. Since it is of order $O(N)$, the
induction ratio may be unacceptably high for systems with a large
number of processes. However, a closer look at the two patterns in
Fig. 4 reveals that the situation in (b), which results in the
worst-case induction ratio, is less likely to happen for applications
where the basic checkpoint intervals typically do not vary too much.
For example in (b), it is very likely for $p_0$ to take at least one
basic checkpoint between t(c^%) and t(c?\). We will show that under the
following constraints, which are satisfied in many applications, the
upper bound on the induction ratio is independent of $N$ for
$Z \ge 2$. (For the case of $Z = 1$, Fig. 3(c) demonstrates that the
worst-case induction ratio of $(N-1)/Z = N-1$ is always achievable and
cannot be reduced.)

Constraint-1: Let $Q$ denote the maximum ratio of any two basic
checkpoint intervals. Although each process is allowed to take its
basic checkpoints at its own pace, $Q$ is typically bounded by a small
constant. (For example, $Q$ is 2 or 3 for our experiments described in
the next section.)

Constraint-2: Let $L$ be the number of complete induction sessions in
$\mathcal{P}_0$. The applications employing checkpointing and rollback
recovery are usually long-running programs, which implies $Z \cdot L$
is quite large. In particular, we assume
$Z \cdot L \gg \lceil Q \rceil$.

From Property 1, each induction session must contain $Z$ consecutive
basic checkpoints and hence at least $Z - 1$ basic checkpoint
intervals. Let $S$ denote the following set of integers

$$S = \{ m : m \cdot (Z-1) \ge Q \ \text{and} \ m \le \lceil Q \rceil \}.$$

For $Z \ge 2$, $S$ contains at least one element, namely,
$\lceil Q \rceil$. Let $M$ be the minimum element of $S$. We define an
M-session as consisting of $M$ consecutive induction sessions, session
#((n-1)M+1) through session #nM. Our approach is based on the
observation that within an M-session, every process either takes at
least one set of $Z$ consecutive basic checkpoints which defines one of
the induction sessions, or takes at least one basic checkpoint due to
Constraint-1. Since, within an M-session, the number of induced
checkpoints is $M \cdot (N-1)$ and the number of basic checkpoints is
at least $N$, the upper bound on the induction ratio is independent of
$N$.

THEOREM 1 Under the above two constraints, the induction ratio
$\mathcal{R} \le \lceil Q \rceil$ for laziness $Z \ge 2$, where $Q$ is
the maximum ratio of any two basic checkpoint intervals.

Proof. Again we only have to consider $\mathcal{P}_0$ for each basic
checkpoint pattern. There are $L_M = \lfloor L/M \rfloor$ complete
M-sessions, each containing $M \cdot (N-1)$ induced checkpoints. We
distinguish the following two cases.

(a) $N \le M$: From Eq. (1),
$\mathcal{R} \le (N-1)/Z < N \le M \le \lceil Q \rceil$.

(b) $N > M$: First we consider the number of induced checkpoints $I$.
If $Z \ge Q + 1$, then $M = 1$ and $I = L \cdot (N-1)$. If
$Z < Q + 1$, $Z \cdot L \gg \lceil Q \rceil$ in Constraint-2 implies
$L \gg \lceil Q \rceil$. Since $M \le \lceil Q \rceil$, we have
$L/M \gg 1$; so $L_M \gg 1$ and $I \approx L_M \cdot M \cdot (N-1)$. In
either case, $I \approx L_M \cdot M \cdot (N-1)$.

Now consider the number of basic checkpoints $B$. For each induction
session #n, the process $p_i$ with
$c^{\mathcal{P}_0}_{i,nZ} = c^{\mathcal{P}_0}_{*,nZ}$ must contribute
$Z$ basic checkpoints, and therefore the length of each induction
session is at least $Z - 1$ basic checkpoint intervals. Within each
M-session, at least $N - M$ processes do not contain
$c^{\mathcal{P}_0}_{*,nZ}$ for any $n$. By the definition of $Q$, these
$N - M$ processes must each contribute at least
$\lfloor M(Z-1)/Q \rfloor$ basic checkpoints. Therefore,

$$B \ge L_M \cdot \left( M \cdot Z + (N-M) \cdot \left\lfloor \frac{M(Z-1)}{Q} \right\rfloor \right)$$

and

$$\mathcal{R} = \frac{I}{B} \le \frac{M \cdot (N-1)}{M \cdot Z + (N-M) \cdot \left\lfloor \frac{M(Z-1)}{Q} \right\rfloor}. \tag{2}$$

Since $Z \ge 1$ and $\lfloor M(Z-1)/Q \rfloor \ge 1$ by definition, we
have

$$\mathcal{R} \le \frac{M \cdot (N-1)}{M + (N-M)} < M \le \lceil Q \rceil, \tag{3}$$

as required. □


Combining Eq. (1) (for $Z = 1$ and Case (a)) and Eq. (2), we then
define the refined upper bound, called the Q-bound, as follows

$$\text{Q-bound} = \frac{M \cdot (N-1)}{M \cdot Z + [N > M] \cdot \left( (N-M) \cdot \left\lfloor \frac{M(Z-1)}{Q} \right\rfloor \right)}$$

where $[N > M] = 1$ if $N > M$ is true and 0 otherwise.
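To make the Q-bound concrete, the following Python sketch evaluates it next to the unconstrained bound of Eq. (1). The function names are hypothetical, and the explicit branch for $Z = 1$ (where the set $S$ is empty and the formula reduces to $N - 1$) is a choice made in this sketch.

```python
import math


def worst_case_ratio(N, Z):
    """Unconstrained upper bound of Eq. (1): (N - 1) / Z."""
    return (N - 1) / Z


def q_bound(N, Z, Q):
    """Refined upper bound on the induction ratio for N processes and laziness Z.

    Q is the (assumed) maximum ratio of any two basic checkpoint intervals.
    """
    if Z == 1:
        # The floor term vanishes, so the bound equals Eq. (1) with Z = 1.
        return float(N - 1)
    # M is the smallest m with m * (Z - 1) >= Q and m <= ceil(Q).
    M = min(m for m in range(1, math.ceil(Q) + 1) if m * (Z - 1) >= Q)
    extra = (N - M) * math.floor(M * (Z - 1) / Q) if N > M else 0
    return M * (N - 1) / (M * Z + extra)
```

For $N = 8$ and $Q = 2$, the sketch gives a Q-bound of 1.4 at $Z = 2$, against a worst-case ratio of 3.5 from Eq. (1).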


5  Experimental  Results 


Four parallel programs written in the Chare Kernel language [31] are
used for the communication trace-driven simulation. The Chare Kernel
has been developed as a machine-independent message-driven parallel
language. Program traces used in this paper are collected from an
Encore Multimax 510.

The four programs include two computer-aided circuit design
applications, Test Generation and Logic Synthesis, and two search
applications, Knight Tour and N-Queen. The execution times are between
25 and 45 minutes (see Table 1). The predetermined minimum basic
checkpoint interval is chosen to be 2 minutes. A variable Next.CP.Time
is initialized to 2 minutes. Each process checks its local clock after
processing every 100 messages. If the clock time exceeds Next.CP.Time,
a basic checkpoint is inserted and Next.CP.Time is incremented by 2
minutes. The resulting average basic checkpoint interval (CPI) for each
program is listed in Table 1. Before processing a new message, each
process also checks if it needs to take an induced checkpoint, as
described in Section 3. All reported numbers are averaged over five
runs.
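The basic checkpoint timer described above can be sketched in Python as follows. This is a simplified illustration, not the simulator used for the experiments: the function name, the representation of a trace as a list of message processing times in seconds, and the variable `next_cp_time` (standing in for Next.CP.Time) are assumptions.

```python
def basic_checkpoint_times(message_times, interval=120.0, check_every=100):
    """Return the clock times at which basic checkpoints are inserted.

    message_times: clock time (seconds) at which each message finishes
    processing, in order. The clock is only consulted after every
    `check_every` messages.
    """
    next_cp_time = interval
    taken = []
    for count, now in enumerate(message_times, start=1):
        if count % check_every == 0 and now > next_cp_time:
            taken.append(now)          # insert a basic checkpoint
            next_cp_time += interval   # schedule the next one
    return taken
```

Because the clock is only checked every 100 messages, a checkpoint is taken some time after the 2-minute mark has passed, which is consistent with the average basic CPIs in Table 1 being longer than 120 seconds.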

We expect the variation of the basic checkpoint interval to be small
because of the way it is maintained. In particular, we choose $Q = 2$
to estimate the induction ratio. The exact value of $Q$ for each
program is listed in Table 1. Although $Q$ is slightly greater than 2
for the first two programs, the numbers listed in the row of "Under-2
percentage" show that a very high percentage of the basic checkpoint
intervals are covered by $Q = 2$, which thus serves as a good
approximation. Fig. 5 plots the Q-bounds against the worst-case and the
actual induction ratios for the four programs. It demonstrates that the
Q-bound provides a good estimate of the induction ratio. The large
difference in the ratio between $Z = 1$ and $Z \ge 2$ confirms that our
generalization of the idea of communication-induced checkpoint
coordination as described in [25] can significantly reduce the extra
checkpoint overhead.

Fig. 6 plots the average rollback distances in terms of the number of
average basic CPIs for the four programs. We use 0.5 for $Z = 1$ and
$(Z-1)/2$ for $Z \ge 2$ in the "Estimated" curve. Figs. 5 and 6
illustrate that lazy checkpoint coordination provides a flexible
trade-off between coordination overhead and recovery efficiency.

6  Summary 

We have proposed the technique of lazy checkpoint coordination and
incorporated it into an uncoordinated checkpointing protocol as a
mechanism for bounding rollback propagation. Recovery line progression
is guaranteed by performing communication-induced checkpoint
coordination only when the predetermined consistency criterion is
about to be violated. The notion of laziness has been introduced to
provide a trade-off between extra checkpoints during normal execution
and the average rollback distance for recovery. Overhead analysis shows
that the upper bound on the induction ratio, i.e., the number of
induced checkpoints divided by the number of basic checkpoints, is
related to the maximum ratio between the basic checkpoint intervals.
Communication trace-driven simulation results for four parallel
programs showed that our analysis can provide a good estimate of the
induction ratio, and that lazy checkpoint coordination can
significantly reduce the number of induced checkpoints.
Table 1: Execution and checkpoint parameters of the parallel programs.

Programs                  Test Generation  Logic Synthesis  Knight Tour  N-Queen
Number of processors      8                6                8            6
Execution time (sec)      2,076            1,736            2,436        1,567
Number of messages        28,219           411,733          104,170      25,880
Average basic CPI (sec)   158              140              132          139
Q                         2.17             2.48             1.42         1.55
Under-2 percentage        99.6%            97.0%            100%         100%